Efficiency-Quality Tradeoffs for Vector Score Aggregation

Authors

  • Pavan Kumar C. Singitham
  • Mahathi S. Mahabhashyam
  • Prabhakar Raghavan
Abstract

Finding the ℓ nearest neighbors to a query in a vector space is an important primitive in text and image retrieval. Here we study an extension of this problem with applications to XML and image retrieval: we have multiple vector spaces, and the query places a weight on each space. Match scores from the spaces are weighted by these weights to determine the overall match between each record and the query; this is a case of score aggregation. We study approximation algorithms that use a small fraction of the computation of exhaustive search through all records, while returning nearly the best matches. We focus on the tradeoff between the computation and the quality of the results.

We develop two approaches to retrieval from such multiple vector spaces. The first is inspired by resource allocation. The second, inspired by computational geometry, combines the multiple vector spaces together with all possible query weights into a single larger space. While mathematically elegant, this abstraction is intractable for implementation. We therefore devise an approximation of this combined space. Experiments show that all our approaches (to varying extents) enable retrieval quality comparable to exhaustive search, while avoiding its heavy computational cost.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.

Proceedings of the 30th VLDB Conference, Toronto, Canada, 2004

1 Overview: score aggregation

We have n records E = {e_1, e_2, ..., e_n} and s sources of evidence. For 1 ≤ i ≤ s, we have a source score σ_i(e_j) from source i for record e_j. Additionally, we have a positive real weight w_i for each of the s sources. For a specified positive integer ℓ, we seek the ℓ records of highest aggregate score defined as
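The exhaustive baseline that the abstract compares against can be sketched as follows. This is a minimal illustration, not the paper's method: it assumes the aggregate score of record e_j is the weighted sum w_1·σ_1(e_j) + ... + w_s·σ_s(e_j), which matches the abstract's description of match scores weighted by the query weights. The function names and the list-of-lists layout for the source scores are hypothetical.

```python
import heapq
from typing import List, Sequence


def aggregate_score(weights: Sequence[float],
                    source_scores: Sequence[Sequence[float]],
                    j: int) -> float:
    """Weighted sum of the s source scores for record j.

    Assumption: the aggregate is sum_i w_i * sigma_i(e_j);
    source_scores[i][j] plays the role of sigma_i(e_j).
    """
    return sum(w * sigma[j] for w, sigma in zip(weights, source_scores))


def top_l_exhaustive(weights: Sequence[float],
                     source_scores: Sequence[Sequence[float]],
                     l: int) -> List[int]:
    """Return indices of the l records with the highest aggregate score.

    Scores every one of the n records, so the cost grows with n * s.
    This is the expensive search that approximate methods try to avoid.
    """
    n = len(source_scores[0])
    return heapq.nlargest(
        l, range(n),
        key=lambda j: aggregate_score(weights, source_scores, j))


# Two sources (s = 2), four records (n = 4); values are made up.
scores = [
    [0.9, 0.1, 0.5, 0.3],  # sigma_1(e_j)
    [0.2, 0.8, 0.6, 0.9],  # sigma_2(e_j)
]
best_first_source = top_l_exhaustive([2.0, 1.0], scores, 2)
best_second_source = top_l_exhaustive([1.0, 3.0], scores, 2)
```

Changing the weights changes which records win, which is why a single precomputed nearest-neighbor index per space is not enough on its own: the ranking depends on the query's weights as well as on the per-source scores.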

Similar resources

Axiomatic and Computational Aspects of Scoring Allocation Rules for Indivisible Goods

We define a family of rules for dividing m indivisible goods among agents, parameterized by a scoring vector and a social welfare aggregation function. We assume that agents’ preferences over sets of goods are additive, but that the input is ordinal: each agent simply ranks single goods. Similarly to (positional) scoring rules in voting, a scoring vector s= (s1, . . . ,sm) consists of m nonincr...

Scoring Rules for the Allocation of Indivisible Goods

We define a family of rules for dividing m indivisible goods among agents, parameterized by a scoring vector and a social welfare aggregation function. We assume that agents’ preferences over sets of goods are additive, but that the input is ordinal: each agent simply ranks single goods. Similarly to (positional) scoring rules in voting, a scoring vector s = (s1, . . . ,sm) consists of m noninc...

Scoring Rules for the Allocation of Indivisible Goods1

We define a family of rules for dividing m indivisible goods among agents, parameterized by a scoring vector and a social welfare aggregation function. We assume that agents’ preferences over sets of goods are additive, but that the input is ordinal: each agent simply ranks single goods. Similarly to (positional) scoring rules in voting, a scoring vector s= (s1, . . . ,sm) consists of m nonincr...

Privacy and Efficiency Tradeoffs for Multiword Top K Search with Linear Additive Rank Scoring

This paper proposes a private ranking scheme with linear additive scoring for efficient top K keyword search on modest-sized cloud datasets. This scheme strikes for tradeoffs between privacy and efficiency by proposing single-round client-server collaboration with server-side partial ranking based on blinded feature weights with random masks. Client-side preprocessing includes query decompositi...

Using Imperialist competitive algorithm optimization in multi-response nonlinear programming

The quality of manufactured products is characterized by many controllable quality factors. These factors should be optimized to reach high quality products. In this paper we try to find the controllable factors levels with minimum deviation from the target and with a least variation. To solve the problem a simple aggregation function is used to aggregate the multiple responses functions then a...

Publication date: 2004